How to make your asymmetric multiprocessor design OS and CPU independent

By Francis St. Amant , Embedded.com
45: 22 2005 (12:42 PM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=175007215

Even as semiconductor companies tend toward multiprocessing solutions in many network, embedded consumer and mobile designs, the RTOSes and development tools are still rushing to catch up with the shift.

Most multiprocessing-capable operating systems support one type of processor at a time and are often limited to symmetric multiprocessing (SMP). While SMP provides a relatively straightforward path from single processing, it can only be used for homogeneous configurations with shared resources.

But most embedded applications are at least partially asymmetric in nature. And asymmetric multiprocessor (AMP) systems require multiple instantiations of the same or different operating systems running on different cores.

In a system with different OSes or a heterogeneous multicore system, implementing a common middleware structure to manage communications between the processors comes down to bridging between the different OSes and/or processor types.

Fortunately, many of the first-generation multiprocessor designs now in consumer embedded systems are not complicated, and it has been possible to synchronize their operation with simple, often hand-coded, communications mechanisms. But such designs will not scale well and are not portable to the emerging generation of three-, four-, six- and eight-core configurations.

What is needed is a way to organize all the system software modules necessary for inter-core communications into a framework transparent to the operating system or platform (hardware and logical).

Achieving multicore communications transparency
Communications transparency can be provided at the application level through an InterProcessor Communications Framework (IPCF) that uses messages to shield the application from the hardware. This allows a systems designer to bridge different processor types running different OSes and to support multiple APIs, using plug-ins to adapt to different development environments.

Incorporating an object-oriented architecture and written in ANSI C, the IPCF uses a multilevel model similar to the OSI networking model. This has the advantage that any of the layers can be modified without affecting the other layers. The framework is also processor and interconnect independent and OS transparent, allowing it to be ported to a wide range of OSes. It is flexible and extensible enough to be adapted to varying multiprocessor design requirements.

As shown below, the interprocessor communications framework allows the use of a traditional Socket-like API to provide a familiar and easy to use interface for the application programmer. The code shown is an excerpt from a demo program illustrating 8 balls/processes interacting through the framework.

Within the framework, an application runs at the top level (see Figure 1 below) with service plug-ins handling the interface between the application and the core of the IPCF. A set of API calls, or other service functionality can be implemented in these service plug-ins to provide the application with access to the multi-processor features. The core of the IPCF consists of repositories, dispatchers and driver managers. At the bottom are the low level drivers that interface with the hardware.

FIGURE 1 (Source: Polycore)

Writing the framework drivers
Writing IPCF drivers only requires modifications of six functions (although more functions may be used if needed):

Init (same for both receive and send)
HWSetup (same for both receive and send)
Sync (same for both receive and send)
Write (only used for send)
Read (only used for receive)
ISR (how to handle interrupts)

These are the generic names. The specific names of the functions are derived from the OUTPORTDATATYPE and INPORTDATATYPE elements in the LPF (driver project file). If, for example, the OUTPORTDATATYPE element is defined as follows:

<OUTPORTDATATYPE>x86LinkDriver</OUTPORTDATATYPE>

then the names of the functions will be:

x86LinkDriver_Init()
x86LinkDriver_HWSetup()
x86LinkDriver_Sync()
x86LinkDriver_Write()
x86LinkDriver_Read()
x86LinkDriver_ISR()
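
As a rough illustration of this naming scheme, the sketch below shows what a driver skeleton for the x86LinkDriver example might look like. The function signatures, the LinkBuffer type and the loopback behavior are assumptions made for illustration only; the actual Poly-Messenger prototypes are not given here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver state: a one-slot loopback buffer stands in for real
 * link hardware. Nothing here is the actual Poly-Messenger interface. */
#define LINK_BUF_SIZE 64

typedef struct {
    unsigned char data[LINK_BUF_SIZE];
    size_t        len;        /* bytes waiting to be read */
    int           ready;      /* set by HWSetup */
    int           irq_count;  /* incremented by the ISR */
} LinkBuffer;

static LinkBuffer g_link;

/* Init: reset software state (same for send and receive). */
int x86LinkDriver_Init(void)
{
    memset(&g_link, 0, sizeof g_link);
    return 0;
}

/* HWSetup: would program the link hardware; here it only marks it ready. */
int x86LinkDriver_HWSetup(void)
{
    g_link.ready = 1;
    return 0;
}

/* Sync: would handshake with the peer; succeeds once HWSetup has run. */
int x86LinkDriver_Sync(void)
{
    return g_link.ready ? 0 : -1;
}

/* Write (send side): copy the payload into the loopback buffer. */
int x86LinkDriver_Write(const void *buf, size_t len)
{
    if (!g_link.ready || len > LINK_BUF_SIZE || g_link.len != 0)
        return -1;
    memcpy(g_link.data, buf, len);
    g_link.len = len;
    return (int)len;
}

/* Read (receive side): drain the loopback buffer into the caller's buffer. */
int x86LinkDriver_Read(void *buf, size_t max_len)
{
    size_t n = g_link.len;
    if (!g_link.ready || n == 0 || n > max_len)
        return -1;
    memcpy(buf, g_link.data, n);
    g_link.len = 0;
    return (int)n;
}

/* ISR: a real handler would acknowledge the interrupt; this one only counts. */
void x86LinkDriver_ISR(void)
{
    g_link.irq_count++;
}
```

In this sketch a caller would run Init, HWSetup and Sync once, then use Write on the sending side and Read on the receiving side. A real driver would replace the loopback buffer with accesses to the link hardware.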

The service plug-ins can be used to implement a high-level development environment for the application. For example, API functionality like signaling using semaphores, or data passing through software FIFOs, can be provided to the application developer. A service function call will internally use a Poly-Messenger repository to implement its functionality. Each repository has a unique identifier in the multi-core system, making it possible to refer to it from code running on any processor in the system.
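
To make the idea of a service built on a repository more concrete, here is a minimal sketch of a semaphore-style service whose state lives in a repository slot. The repository_t type, the sem_service_* names and the fixed-size table are all hypothetical, and the sketch covers only repositories on the local node.

```c
#define MAX_REPOS 8

/* Hypothetical repository: here it is just a counter backing a semaphore.
 * The slot index plays the role of the system-wide repository identifier. */
typedef struct {
    int in_use;
    int count;
} repository_t;

static repository_t g_repos[MAX_REPOS];

/* Return the repository for 'id', or a null pointer if it does not exist. */
static repository_t *repo_lookup(int id)
{
    if (id < 0 || id >= MAX_REPOS || !g_repos[id].in_use)
        return 0;
    return &g_repos[id];
}

/* Create a semaphore in slot 'id' with an initial count; 0 on success. */
int sem_service_create(int id, int initial)
{
    if (id < 0 || id >= MAX_REPOS || g_repos[id].in_use || initial < 0)
        return -1;
    g_repos[id].in_use = 1;
    g_repos[id].count = initial;
    return 0;
}

/* Signal the semaphore: increment the count held in the repository. */
int sem_service_post(int id)
{
    repository_t *r = repo_lookup(id);
    if (!r)
        return -1;
    r->count++;
    return 0;
}

/* Non-blocking wait: take one unit if available, otherwise fail. */
int sem_service_trywait(int id)
{
    repository_t *r = repo_lookup(id);
    if (!r || r->count == 0)
        return -1;
    r->count--;
    return 0;
}
```

The application sees only the three service calls. Where the counter is stored, and on which processor, stays hidden behind the repository identifier.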

When the service called refers to a repository on the local node, the requested action can be executed immediately. However, when the repository referred to is situated on another processor in the system, a dispatcher is called to send the command to the remote processor.
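
The local-versus-remote decision can be sketched as follows. The encoding of the repository identifier (node number in the high 16 bits, repository index in the low 16 bits) and all function names are assumptions for illustration; the framework's real identifier format is not described here.

```c
#include <stdint.h>

/* Hypothetical repository ID layout: high 16 bits name the node, low 16 bits
 * name the repository on that node. */
typedef uint32_t repo_id_t;

#define REPO_ID(node, index) \
    ((((repo_id_t)(node)) << 16) | ((repo_id_t)(index) & 0xFFFFu))
#define REPO_NODE(id)   ((uint16_t)((id) >> 16))
#define REPO_INDEX(id)  ((uint16_t)((id) & 0xFFFFu))

typedef enum { HANDLED_LOCALLY, SENT_TO_DISPATCHER } service_result_t;

static int g_local_executions;   /* commands run on this node */
static int g_dispatched;         /* commands handed to a dispatcher */

/* Stand-in for executing the command on a local repository. */
static void execute_local(uint16_t index)
{
    (void)index;
    g_local_executions++;
}

/* Stand-in for the dispatcher that forwards a command to another node. */
static void dispatch_remote(uint16_t node, uint16_t index)
{
    (void)node;
    (void)index;
    g_dispatched++;
}

/* Service call entry point: run locally when possible, otherwise dispatch. */
service_result_t service_call(uint16_t this_node, repo_id_t target)
{
    if (REPO_NODE(target) == this_node) {
        execute_local(REPO_INDEX(target));
        return HANDLED_LOCALLY;
    }
    dispatch_remote(REPO_NODE(target), REPO_INDEX(target));
    return SENT_TO_DISPATCHER;
}
```

Because the identifier carries the owning node, the same service_call can be compiled unchanged for every processor. Only the value passed as this_node differs.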

The dispatcher will determine the route to the destination node. Different dispatchers are used for different types of interconnect between the processors, to provide dispatchers optimized for the specific hardware. After deciding the route, the dispatcher passes the command on to the driver manager, which will start the transmission to the next node.

In a system with a large number of processors, not all processors are connected to each other. Where a message must be passed through intermediate processors, the framework manages this process. If configured to use parallel routes, the framework splits the data, reconstructs it at the destination, and handles out-of-order arrival.
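
One common way to handle splitting and out-of-order arrival is to tag each fragment with a sequence number and place it by position at the destination. The sketch below illustrates that general technique; the fragment format, sizes and function names are hypothetical and are not taken from the framework.

```c
#include <stddef.h>
#include <string.h>

#define FRAG_PAYLOAD 4     /* bytes per fragment; tiny on purpose */
#define MAX_FRAGS    16

/* Hypothetical fragment format: sequence number plus payload. */
typedef struct {
    unsigned      seq;                 /* position of this fragment */
    size_t        len;                 /* valid bytes in data[] */
    unsigned char data[FRAG_PAYLOAD];
} fragment_t;

/* Split msg into fragments; returns the fragment count, or 0 if too large. */
size_t split_message(const unsigned char *msg, size_t len, fragment_t *out)
{
    size_t count = (len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD;
    if (count > MAX_FRAGS)
        return 0;
    for (size_t i = 0; i < count; i++) {
        size_t off = i * FRAG_PAYLOAD;
        size_t n = (len - off < FRAG_PAYLOAD) ? len - off : FRAG_PAYLOAD;
        out[i].seq = (unsigned)i;
        out[i].len = n;
        memcpy(out[i].data, msg + off, n);
    }
    return count;
}

/* Reassembly state kept at the destination node. */
typedef struct {
    unsigned char buf[MAX_FRAGS * FRAG_PAYLOAD];
    unsigned char seen[MAX_FRAGS];   /* 1 once fragment i has arrived */
    size_t        total_len;         /* payload bytes received so far */
    size_t        received;          /* fragments received so far */
} reassembly_t;

void reassembly_init(reassembly_t *r)
{
    memset(r, 0, sizeof *r);
}

/* Accept a fragment in any order. Returns 1 when all 'expected' fragments
 * have arrived, 0 if more are pending, -1 for a duplicate or bad fragment. */
int reassembly_add(reassembly_t *r, const fragment_t *f, size_t expected)
{
    if (f->seq >= MAX_FRAGS || f->len > FRAG_PAYLOAD || r->seen[f->seq])
        return -1;
    memcpy(r->buf + (size_t)f->seq * FRAG_PAYLOAD, f->data, f->len);
    r->seen[f->seq] = 1;
    r->total_len += f->len;
    r->received++;
    return r->received == expected ? 1 : 0;
}
```

Each fragment is copied to the offset implied by its sequence number, so the order in which the parallel routes deliver fragments does not matter. The sketch assumes the receiver learns the expected fragment count by some other means, for example a header message.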

Achieving processor independence
Because of its object-oriented structure and the fact that it is written entirely in ANSI C, the IPCF can be ported to any processor with an ANSI C compiler, which is just about every processor on the market. And since C code can be compiled to custom logic using hardware-software co-design tools, it's possible to use the IPCF to interface code running on a processor with custom logic.

The fact that the services are plug-ins and also can be written in ANSI-C makes them portable, too. During system design, a set of API or service calls can be made standard for the whole system, the plug-in written and ported to all processors in the system, which results in a uniform system-wide development model. Or, if there are no specific demands for the API, a pre-written service plug-in can be used on all processors, too.

In addition, communication problems between 16-bit and 32-bit processors or between processors with different “endianness” can be handled in the service plug-in and in the driver.
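
A standard way to avoid endianness problems is to serialize every multi-byte value into a fixed wire order using shifts rather than copying memory. The sketch below uses big-endian wire order; the function names are illustrative, and the choice of wire order is an assumption, not something the framework prescribes.

```c
#include <stdint.h>

/* Octets are held in unsigned char and masked to 8 bits, because some DSP
 * compilers use a char that is wider than eight bits. */

/* Write a 32-bit value in big-endian wire order, whatever the host order. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Read a 32-bit value from big-endian wire order. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)(in[0] & 0xFFu) << 24) |
           ((uint32_t)(in[1] & 0xFFu) << 16) |
           ((uint32_t)(in[2] & 0xFFu) << 8)  |
            (uint32_t)(in[3] & 0xFFu);
}

/* 16-bit variants, useful when one side is a 16-bit processor. */
void put_u16_be(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)((v >> 8) & 0xFFu);
    out[1] = (unsigned char)(v & 0xFFu);
}

uint16_t get_u16_be(const unsigned char in[2])
{
    return (uint16_t)(((in[0] & 0xFFu) << 8) | (in[1] & 0xFFu));
}
```

Since the shifts operate on values rather than on memory layout, the same source produces the same wire bytes on little-endian and big-endian hosts. A service plug-in or driver could apply such helpers to every field of a message before transmission.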

Interconnect independence
Drivers that can be plugged into the framework can be written for custom hardware, or, when developing on a system with existing drivers, it's easy to write a plug-in to allow Poly-Messenger to use these drivers.

Through the use of a configuration file or configuration tool, the system designer can specify the structure of the processor network. At runtime, the drivers are called automatically by IPCF. When the application makes a service call that accesses a remote repository, the message will be passed to a dispatcher that decides which driver must be used to send the message to the destination. The driver is selected based on the description in the configuration file.
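
Configuration-driven driver selection can be pictured as a table lookup. In the sketch below, a static table stands in for the parsed configuration file, and two stub functions stand in for link-port and bus drivers. The table layout, node numbers and function names are all hypothetical.

```c
#include <stddef.h>

/* Signature assumed for a driver's send routine. */
typedef int (*send_fn)(const void *buf, size_t len);

static int g_last_driver;   /* records which stub ran: 1 = link, 2 = bus */

/* Stub for a point-to-point link driver. */
static int link_send(const void *buf, size_t len)
{
    (void)buf;
    g_last_driver = 1;
    return (int)len;
}

/* Stub for a shared-bus driver. */
static int bus_send(const void *buf, size_t len)
{
    (void)buf;
    g_last_driver = 2;
    return (int)len;
}

/* One entry per reachable destination, as a configuration tool might emit. */
typedef struct {
    int     dest_node;
    send_fn send;
} route_entry_t;

static const route_entry_t g_routes[] = {
    { 2,  link_send },
    { 3,  link_send },
    { 10, bus_send  },
};

/* Dispatcher: pick the driver configured for dest_node and send through it.
 * Returns the driver's result, or -1 if no route is configured. */
int dispatch_message(int dest_node, const void *buf, size_t len)
{
    for (size_t i = 0; i < sizeof g_routes / sizeof g_routes[0]; i++) {
        if (g_routes[i].dest_node == dest_node)
            return g_routes[i].send(buf, len);
    }
    return -1;
}
```

The application never names a driver. Changing the interconnect means regenerating the table, not editing application code.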

The driver project file specifies the four main properties of an IPCF driver:

Header files – typically one for each direction, i.e. read/write
Data types – both directions
Init function name – both directions
Parameter data types – both directions

To see how these properties are configured, let’s examine a sample lpf file. First start with the main project file (mpf extension). The driver and dispatcher XML files (lpf and dpf) are included as follows:

The parameter INCLUDELINKDRIVER specifies the XML file that in turn describes the driver, “pipelink.lpf” in the ball example. The LPF file specifies details about the driver, as in the example below:

Making your multiprocessor system OS transparent
Because system services are plug-ins into the IPCF, the actual API that is used for inter-processor communications can be adapted to the needs of the developer. If the developer is using an existing OS API, a plug-in implementing the same or a similar API can be written.

This new API, being a plug-in into the framework, has the added advantage of being multi-processor aware. Additionally, it can be ported to other processors in the same system, even if these run a different OS, for example Linux or Symbian. In small systems, the IPCF can also work without an operating system, using only the compiler. In that case, the API calls or services that are used are called directly from main() or its sub-functions.

Because of its use of an OSI-like layered networking structure, routing through complicated static multiprocessor networks can be handled easily by the IPCF’s dispatchers. Different dispatchers handle different types of processor communications.

In a network of TigerSHARCs, for example, the link ports form a high-bandwidth, low overhead communication medium. Link ports are point-to-point communications. Other dispatchers take care of bus-style communications, where all processors are connected to a shared bus.

The layered structure of the IPCF allows software designers to assign subnets in their system, useful where there is more than one way of communicating among multiple processing elements. For example, in a wireless base station using multiple PowerPCs and TigerSHARCs (Figure 2 below), a subnet using point-to-point link ports is assigned for each board. Here, one extra subnet is assigned to the bus communications between the PowerPC and the first TigerSHARC on each board.

FIGURE 2 (Source: Polycore)

This first processor on each board then acts as a gateway between the subnet local to the board, and the subnet that spans the bus. If another processor on the local subnet needs to pass data over the bus, the framework will first send this data to the gateway processor, which will then take care of the bus transfer.

If the board allows it, it's also possible that more than one processor is a gateway to the bus subnet. In this case, routing on the internal board to the gateway processors can be optimized.
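
The gateway rule for the single-gateway case can be sketched as a next-hop function. The addressing scheme (a board number plus a position on the board, with position 1 as the gateway) and the assumption that any two processors on a board can reach each other directly are simplifications made for illustration.

```c
/* Hypothetical addressing: each node has a board (subnet) number and a
 * position on that board. Position 1 is the board's gateway, i.e. the first
 * processor on each board in the base station example. */
typedef struct {
    int board;
    int pos;
} node_addr_t;

#define GATEWAY_POS 1

/* Return the next hop on the way from 'self' to 'dest'. */
node_addr_t next_hop(node_addr_t self, node_addr_t dest)
{
    node_addr_t hop;

    if (self.board == dest.board) {
        hop = dest;                  /* same subnet: direct link transfer */
    } else if (self.pos != GATEWAY_POS) {
        hop.board = self.board;      /* off-board: go to local gateway first */
        hop.pos = GATEWAY_POS;
    } else {
        hop.board = dest.board;      /* gateway: cross the bus subnet */
        hop.pos = GATEWAY_POS;
    }
    return hop;
}
```

A message from processor 3 on board 0 to processor 4 on board 1 would therefore travel to the board 0 gateway, cross the bus to the board 1 gateway, and then take a local link to its destination. Supporting several gateways per board would mean choosing among candidate gateways in the second branch.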

Applying the framework to a heterogeneous AMP design
In the network base station example shown in Figure 2 above, large numbers of low-power DSPs can be fitted onto a rack without extra cooling requirements, but having so many processors makes the software design somewhat problematic.

In this case study, the base station consists of a number of boards with TigerSHARC DSPs, connected to a PowerPC situated on a separate board. Communications between the TigerSHARCs are implemented using the TigerSHARC link ports, a flexible means of point-to-point communication. The boards are fitted in a CompactPCI rack, which provides the bus for inter-board communications.

The IPCF configured here reflects the two kinds of interconnects used in the design. First, each TigerSHARC board is a subnet on its own: communications between the processors on the same board use the link ports. Second, in order to go off-board, each board has one gateway processor (labeled “1” on the diagram).

This processor is the only one on the board that has access to the CompactPCI bus. When any of the other processors needs to communicate with another board, or with the PowerPC, the IPCF will make use of the link ports to send the data to the gateway processor, which will initiate the PCI transfer.

Analog Devices provides the Visual DSP Kernel (VDK) with its compiler and IDE. Since the base station is multi-channel, VDK is the obvious choice to implement multi-tasking on each TigerSHARC core.

On top of VDK, a service plug-in interfaces the application to the framework for inter-processor communications. This service plug-in can be implemented to suit the needs of the application. At the bottom, drivers can be written using the VisualDSP hardware access routines.

On the PowerPC, an operating system of choice can be selected. VxWorks and Linux are common choices for this processor, both having their advantages and disadvantages. In this case study, Linux is used.

On top of Linux, compiling the same service plug-in as on the TigerSHARC processors provides a uniform development environment for the whole system. A driver interfaces with the Linux PCI driver system, which would be available in source code.

The framework, its service plug-in and the application parts that access the IPCF services, however, would be compiled as Linux application level software, and remain proprietary source code.

During development, if a Linux process written in C uses too much processor power, it can be recompiled for one of the TigerSHARCs, because the same API calls are available. If a next-generation base station uses more than one PowerPC board, some TigerSHARC tasks can be recompiled to run on the extra processors. If a next-generation system uses entirely different hardware, it's possible to recompile the API plug-ins for that system, and the application can be ported with minimal effort.

Francis St. Amant is Engineering Manager at Polycore Software Inc.
